2022/09/29

One factoral design

Research question: Does happiness differ between the Swiss regions?

Data

Hypotheses

Hypothesis 1: The respondents from the 7 regions reported different mean happiness levels.

Hypothesis 2: Respondents from Espace Mittelland reported higher mean happiness levels than Zentralschweiz.

Look at the data - numerical

Nuts2 n mean trimmed10 median sd var skew kurt
Région lémanique 570 2.91 2.91 3 0.92 0.85 0.17 3.58
Espace Mittelland 720 2.72 2.72 3 0.89 0.79 0.36 3.01
Nordwestschweiz 427 2.70 2.70 3 0.88 0.78 0.47 4.05
Zürich 513 2.76 2.76 3 0.97 0.95 0.59 3.93
Ostschweiz 433 2.73 2.73 3 0.95 0.90 0.75 4.56
Zentralschweiz 275 2.61 2.61 3 0.85 0.73 0.62 4.00
Ticino 160 2.91 2.91 3 0.98 0.95 1.16 6.14

Look at the data - graphical

Box plots are excellent to display distributions.
Why are they not a good choice in case?

Look at the data - graphical

WARNING: depending on the bin size histograms can be misleading.

Look at the data - graphical

Quantile-Quantile-plots are a great way to compare the sample distribution to a theoretical distribution. Ideally, the points would match the line.

Why do we see a stair pattern?

Look at the data - graphical

add some random noise (normal [0, 0.5])

Analysis - parametric

Omnibus

oneway.test(H1~Nuts2,var.equal=FALSE, data=df_1f)
## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  H1 and Nuts2
## F = 5.2507, num df = 6.0, denom df = 1030.9, p-value = 2.434e-05

Levine and Hullett (2002) recommend ω² or η² as effect size for ANOVAs.

  • partial η² (used by SPSS) strongly depends on the variability of the residuals
  • η² biased e.g. when n is small or there are many levels
aov(H1~Nuts2, data=df_1f) %>% effectsize::omega_squared(verbose=F) %>% toTable()
Parameter Omega2 CI CI_low CI_high
Nuts2 0.0079776 0.95 0.002278 1

Hypothesis 1: The respondents from the 7 regions reported different mean happiness levels. –> Null-Hypothesis can be rejected, but the effect is negligible

Digression: Effect size


The binning of effect sizes are just rules of thumb and somewhat arbitrary.

Cohen (1992) Field (2013)
very small < 0.02 < 0.01
small < 0.13 < 0.06
medium < 0.26 < 0.14
large >= 0.26 >= 0.14


When rating the effect size, consider the customs of your (sub-)domain and, more importantly, the size of other known effects on your dependent variable.

The R package effectsize includes various rules to help with the interpretation.

effectsize::interpret_omega_squared(0.008, rules = "cohen1992")
## [1] "very small"
## (Rules: cohen1992)

Analysis - parametric

Contrasts

The typical way of testing Hypothesis 2 ( Espace Mittelland happier than Zentralschweiz) is with a linear contrast (but this is NOT the recommended way).

f1_emm <- lm(H1~Nuts2, data=df_1f) %>% emmeans::emmeans('Nuts2', data=df_1f)
emmeans::test(
  emmeans::contrast(f1_emm, list(ac1=c(0, 1, 0, 0, 0, -1, 0))),
  adjust='none')
##  contrast estimate     SE   df t.ratio p.value
##  ac1         0.114 0.0651 3091   1.753  0.0797

Note 1: This analytic contrast tests a distinct hypothesis; hence no p-adjustment is needed. Comparisons without specific hypotheses (e.g., orthogonal contrasts) would need an adjustment of the significance level (e.g., False Discovery Rate)

Note 2: This analytic contrast is 2-sided, but H2 is 1-sided -> p needs to be halved

BUT: Linear contrasts are very sensitive to variance heterogeneity. Jan & Shieh (2019) recommend Welch’s t-test instead.

Analysis - parametric

Contrasts

Perform Welch’s t-test

df_1f %>%
  filter(Nuts2 %in% c('Espace Mittelland', 'Zentralschweiz')) %>%
  t.test(H1~Nuts2, data=., alternative='greater')
## 
##  Welch Two Sample t-test
## 
## data:  H1 by Nuts2
## t = 1.866, df = 513.61, p-value = 0.03131
## alternative hypothesis: true difference in means between group Espace Mittelland and group Zentralschweiz is greater than 0
## 95 percent confidence interval:
##  0.01333942        Inf
## sample estimates:
## mean in group Espace Mittelland    mean in group Zentralschweiz 
##                        2.725000                        2.610909

Analysis - parametric

Contrasts

Get effect size

df_1f %>%
  filter(Nuts2 %in% c('Espace Mittelland', 'Zentralschweiz')) %>%
  mutate(Nuts2 = forcats::fct_drop(Nuts2)) %>%
  effsize::cohen.d(H1~Nuts2, data=.)
## 
## Cohen's d
## 
## d estimate: 0.1299925 (negligible)
## 95 percent confidence interval:
##        lower        upper 
## -0.009234415  0.269219496

Hypothesis 2: Respondents from Espace Mittelland reported higher mean happiness levels than Zentralschweiz.

–> the Null-Hypothesis can be rejected, but the effect is negligible

Analysis - parametric

Post-Hoc Analysis

Games-Howell Modification of the Tukey Test

Works with unequal samples sizes and heterogeneity of variance.

rstatix::games_howell_test(df_1f, H1~Nuts2, conf.level = 0.95, detailed = FALSE)
## # A tibble: 21 × 8
##    .y.   group1            group2      estimate conf.…¹ conf.h…²   p.adj p.adj…³
##  * <chr> <chr>             <chr>          <dbl>   <dbl>    <dbl>   <dbl> <chr>  
##  1 H1    Région lémanique  Espace Mit… -0.189    -0.339 -0.0388  4   e-3 **     
##  2 H1    Région lémanique  Nordwestsc… -0.218    -0.389 -0.0482  3   e-3 **     
##  3 H1    Région lémanique  Zürich      -0.158    -0.328  0.0130  9.2 e-2 ns     
##  4 H1    Région lémanique  Ostschweiz  -0.182    -0.358 -0.00555 3.8 e-2 *      
##  5 H1    Région lémanique  Zentralsch… -0.303    -0.494 -0.113   6.31e-5 ****   
##  6 H1    Région lémanique  Ticino      -0.00779  -0.264  0.249   1   e+0 ns     
##  7 H1    Espace Mittelland Nordwestsc… -0.0294   -0.189  0.130   9.98e-1 ns     
##  8 H1    Espace Mittelland Zürich       0.0313   -0.129  0.191   9.97e-1 ns     
##  9 H1    Espace Mittelland Ostschweiz   0.00710  -0.159  0.173   1   e+0 ns     
## 10 H1    Espace Mittelland Zentralsch… -0.114    -0.295  0.0669  5.04e-1 ns     
## # … with 11 more rows, and abbreviated variable names ¹​conf.low, ²​conf.high,
## #   ³​p.adj.signif

Analysis - parametric

Post-Hoc Analysis


Pairwise Welch t-tests with alpha adjustment

pairwise.t.test(df_1f$H1, df_1f$Nuts2, data=df_1f, pool.sd=TRUE, p.adj="fdr")
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  df_1f$H1 and df_1f$Nuts2 
## 
##                   Région lémanique Espace Mittelland Nordwestschweiz Zürich 
## Espace Mittelland 0.00171          -                 -               -      
## Nordwestschweiz   0.00171          0.69941           -               -      
## Zürich            0.01678          0.69105           0.43712         -      
## Ostschweiz        0.00797          0.92449           0.69105         0.75808
## Zentralschweiz    0.00015          0.13945           0.34979         0.07963
## Ticino            0.92449          0.06290           0.04002         0.13636
##                   Ostschweiz Zentralschweiz
## Espace Mittelland -          -             
## Nordwestschweiz   -          -             
## Zürich            -          -             
## Ostschweiz        -          -             
## Zentralschweiz    0.14054    -             
## Ticino            0.08487    0.00644       
## 
## P value adjustment method: fdr